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to determine the performance of the algorithms on low signal to noise ratio images. 
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I. INTRODUCTION 



A. GENERAL 

Determining the relative motion between an observer and his environment is a 
major problem in computer vision. Its applications include mobile robot navigation and 
the monitoring of dynamic processes. Motion estimation also has many applications in 
image processing, such as coding, restoration, and reducing the noise by temporal 
filtering. 

In computer vision, motion in images is recovered from a time ordered sequence 
of images. The relative motion between the objects in a scene and a camera, gives rise 
to the apparent motion of the objects in a sequence of images. This motion may be 
characterized by observing the apparent motion of a discrete set of features or brightness 
patterns in the images. The objective of motion analysis is the derivation of the motion 
of the objects in the scene through the analysis of motion features or brightness patterns 
associated with objects in the sequence of images. 

A closely related subject to motion estimation is the estimation of structure of the 
imaged scene. Although structure may be computed independent of motion via stereo 
vision, knowledge of the object motion can facilitate establishment of feature 
correspondences within a pair of stereo images, thus aiding the determination of 
structure. Indeed, psychological researchers have shown that apparent motion is a clue 
used by the human visual system for computing the scene structure [Ref. 1]. This close 
relationship between the estimation of structure and the estimation of motion has 
combined these two problems. 
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In the following sections, the fundamental principles of several approaches for the 
estimation of motion and structure will be discussed. 

B. METHODOLX)GIES FOR MOTION ESTIMATION 

Two distinct approaches have been developed for the computation of motion: 

1. Methods Based on the Use of Image Position 

This method is based on extracting a set of relatively sparse, but highly 
discriminatory, two-dimensional features in the images corresponding to three- 
dimensional object features, such as comers or occluding boundaries of surfaces. Such 
points, lines and/or curves are extracted from each frame and then inter-frame 
correspondence is established between these features. The observed displacement of the 
2D image features are used to solve the resulting motion equations. Constraints are 
formulated based on a rigid body motion assumption. This method assumes that 
correspondence is available between features extracted from one image in a sequence of 
images and those extracted from the next image. Establishing and maintaining such 
correspondence is a very hard problem. The ambiguity is produced by the effects of 
occlusion and noise which cause features to appear or disappear and also give rise to 
false features. Some of the techniques developed for solving the corresponding problem 
for stereo vision and optical flow algorithms may be applied to this problem [Ref. 1]. 

Analysis algorithms that rely on feature correspondence are termed feature- 
based algorithms. The feature-based algorithms will be explained in Chapter III in more 
detail. First projection methods, then the motion analysis algorithms [Ref. 2], [Ref. 3] 
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based on these projection methods will be explained. An analysis of motion parameter 
error caused by correspondence errors will also be presented in chapter III. 

2. Methods Based On Optical Flow 

Optical flow is the two-dimensional field of instantaneous velocities of 
brightness values (gray levels) in the image plane. The optical flow techniques mainly 
rely on local spatial and temporal derivatives of image brightness values. There are 
several methods which have been proposed for computing optical flow of time-varying 
images. These methods are mainly based on: 

a. Local features 

The first step of this method is the detection of local features in each 
image. After obtaining the features a pair-wise matching between corresponding features 
in two frames must be obtained that minimizes an appropriate cost function. 

b. Spatitemporal gradient 

This method uses the first order spatial and temporal differentials of time 
varying images to estimate at each image point the component of motion in the direction 
of maximally increasing gray-scale intensity. This method is based on the equation: 



hx 5y 6t 



( 1 . 1 ) 



where f is the image function, t is time, S is the partial derivative operator, u and v are 
the X and y components of the optical velocity [Ref. 4]. 

Equation 1.1 alone is not sufficient to determine the optical velocities. 
Some constraints must be imposed, for example: 
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1. Optical flow is smooth. 

2. Optical flow is constant and continuous over entire segments in image, 

3. Motion is restricted (e.g., planar motion). 

Using these constraints, some solutions are proposed [Ref. 5], [Ref. 6]. 
c. Fourier Phase Approach 

This method uses the shift property of the Fourier transform, and is the 
most appropriate method for determining the motion of a single object moving across a 
uniform background [Ref. 4]. This method is restrictive because only a single non- 
localized velocity vector is obtained for each image frame. 

The computation of optical flow requires the evaluation of first and 
second partial derivatives of image brightness values and also of the optical flow. The 
real images are noisy, in general. The evaluation of derivatives is a noise enhancing 
operation. As we increase the order of the derivative, we increase the sensitivity of 
noise. Also because of occlusion there may be some discontinuities in the optical flow, 
these regions must be detected reliably, otherwise the continuity assumption is violated. 

A new Fourier phase approach proposed by Burl [Ref. 7] based on the 
extended Kalman filter is discussed in Chapter II. The extended Kalman filter is 
implemented in spatial frequency domain and provides an estimate of the object velocity. 
Accumulative difference and binary image approaches are discussed in Chapter IV. 
Chapter V compares the performance of the algorithms for low signal to noise ratio 
(SNR) images. A discussion about the results of the computer simulations will also be 
presented at that chapter. 
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n. EXTENDED KALMAN FILTER ALGORITHM 



A. GENERAL 

An extended Kalman filter (EKF) is used to estimate the velocity of an object 
moving across an image frame and to reduce the effect of noise. The algorithm is 
proposed by Burl [Ref. 7] and the image sequences are modeled by dynamic nonlinear 
state equations. A simple dynamic model consists of a shift operator with the shift given 
by the velocity times the sample time. This model is given in both the spatial and spatial 
frequency domains. Application of EKF in the spatial domain requires a prohibitive 
number of calculations but implementation of the EKF in the spatial frequency domain 
reduces the number of calculations by a great amount. The resulting algorithm, referred 
to as the parallel extended Kalman filter (PEKF), is developed in detail and a number of 
practical considerations are presented in the Reference 7. As the velocity estimation 
error approaches zero, the EKF is shown to converge to a parallel set of third-order 
extended Kalman filters. This structure is referred to as the modified extended Kalman 
filter (MEKF). The single frequency EKFs are combined to yield an estimate of the 
velocity of the moving object using weighted least square algorithm. An ambiguity of 
multiplies of 2 -k in the frequency-velocity estimates is addressed in Reference 7. A 
proposed a two-step algorithm to solve the ambiguity utilizing some structures of the 
problem is also given in Reference 7. First, the object velocity is estimated using the 
subset of the spatial frequencies where there is no ambiguity, then this velocity estimation 
is used to estimate the ambiguities for other spatial frequencies. Another approaches to 
solve the ambiguity problem will be presented in Section II. C using the properties of 
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EKF and the problem structure. An outline of the EKF algorithm is given in Section 
II. B. The simulation results of the proposed algorithm with varying SNR images is given 
in Section II. D. 

B. OUTLINE OF THE EKF ALGORITHM 

In the spatial domain, successive image frames composed of N by N pixels and 
containing a moving object is modeled by a nonlinear state space equation: 



x(it+l) = 



y(k+lX 



S(y) 0 
0 I 






njik) 

«v(*) 



( 2 . 1 ) 



where x(k) is the state of the system, f(k) E is composed of amplitudes of each 
pixel at time k stacked into a vector, v E 7?^ is the velocity vector in pixels per sampling 
time, and S(v) is a two-dimensional shift operator with the magnitude and direction being 
a function of the velocity vector, and n/k) and n^(k) are the image noise and the velocity 
noise, respectively. 

The corresponding measurement equation is: 



y(J^) =A ^) + V (k) =[/ 0}x(A:) + V (k), (2 • 2) 

where y E R^^ is the measured image and u E R^^ is a zero mean, white, Gaussian 
random process with a covariance matrix R„=oU. 

The spatial domain model described by Equations 2. 1 and 2.2 is transformed to the 
spatial frequency domain by using the two-dimensional discrete Fourier transform. The 
state equation in the spatial frequency domain is then defined: 
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(2.3) 





Z>(v) 


0 


F(k) 


± 


Zp(k) 


7(^+1). 


i 0 


I 


V(k) 




Zyik) 



where F(k) is the fourier transform of f(k), D(v) is the transformed shift operator for a 
specific spatial frequency, and Zp(k) is the transformed image plant noise, Zy(k) is the 
transformed velocity noise. 

The corresponding measurement equation then becomes: 

Y{k)=F(k)+N(Jc)=[I Q])^(k)*N(Jc), (2 4) 

where Y(k) and N(k) are the Fourier transform of y(k) and v(k), respectively. 

The EKF uses the spatial frequency domain state equations. Equations 2.3 and 2.4. 
The linearized state equation is given as, 

bX(k^\)=kiX^bX{k)^Z(k), (2.5) 

where Xq is the current estimate and hX(k) = X(k) - Xq, A(Xo) is the Jacobian of the 
nonlinear state equations about X^. 

A summary of EKF structure is given below from References 8 and 9: 

Stare estimation: 



X(it+l|Jt)= fl(X(Jk|Jt),0); (2-6) 

Covariance estimation: 

P(Jt+l lit)=A(X(itli(:))P(Jt|Jfc)A^(X(Jl:|it))+(?^; (2.V) 

Innovation: 
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e(k\k) = y()t+i)-y(;t+i|it) = nk+i)-cx(k+i\ky. 



( 2 . 8 ) 



Kalman Gain: 



G(it+1) = P(k+l\k)C^C P(k+l\k) (2-9) 

State correction: 



X{k+1 |Jt+l)=X(A:+l |Jt)+G(A:+l)[y(A:+l)-y(it+l (2.10) 

Covariance correction: 



P(,k+l\k+l) = [ I-G(k+l)QP(k+l\k). (2-H) 

The parallel extended Kalman filter structure is given in Figure 2.1. Reference 7 
outlines how the EKF converges to the MEKF as the velocity estimation error approaches 
zero. 



The state vector for a specific spatial frequency is defined: 





x,,(k) 




ReiF-m 


X,ik) = 




= 


Im(F.(k) 




X,#) 




T 

Q,V 



( 2 . 12 ) 



where Xji{k) and Xt 2 (k) are the Fourier coefficients of a specific spatial frequency and ATy 
is the frequency-velocity product. 
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C. VELOCITY ESTIMATION 
1. Weighted Least Square 

In Equation 2.12, is the estimated frequency-velocity product. These 
estimates can be combined to obtain the estimate of the velocity of the moving object. 
This estimation algorithm is complicated by an ambiguity of multiples of 2x. The 
frequency-velocity product estimates can be modelled as, 

X^(k\k)=ilv+w+2nm, (2-13) 

where X^(k | k) is the estimated frequency-velocity product, fi contains the spatial 
frequencies, and w is a zero mean, random vector, with the estimated covariance: 
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S =E[ww ^ = Jiflg[Pj 33 ^ 233 »- 

The subscript /33 denotes the [3,3] component of the state estimation error covariance 
matrix of the i\h single frequency EKF. The vector m models the ambiguity whose 
elements are integer and constrained to be: 

I'Wil ^ 

since the object is constrained to remain within the image. 

The estimated | k) can be used to estimate the velocity that minimizes 
the weighted sum of the square error: 

J=(X^(k\k)-Civ-2nm)^'Z'\x^(k\k)-Clv-2Ttm). (2.16) 

The solution for v can be found by setting the gradient of J with respect to v to zero: 

— =Q^2-’£5v+Q^2-'(27cw)-Q^2-'X3(/:|/:)=0. (2.17) 

dv 

From Equation 2.17: 

v=(Q^2-‘fl)-'(Q^2-'X3(Jt|/:)-fl^2-‘(27:m)). (2-18) 

The search for optimal vector m can be simplified by utilizing some conditions of the 
problem as explained in Reference 7. From these conditions it is found that the /n,s are 
equal to zero for almost all spatial frequencies with magnitude given: 
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(2.19) 



11(^11 < 



ItiN 



This subset of spatial frequencies can be used to estimate the unambiguous velocity. 
Then this velocity can be used to estimate the /?i,s for spatial frequencies that do not meet 
the condition in Equation 2.19. The solution for vector m can be found by equating the 
gradient of J in Equation 2.16 with respect to m to zero: 



-=2Ttm->-Clv-XJk\k)=0. (2.20) 

dm 



Using Equation 2.19: 



m.=i?o«/Kf(^[X3(it|it)-o>fv]), (2.21) 

where Round(-) equals to nearest integer. Then Equation 2.18 can be used to estimate 
the V vector that minimizes the Equation 2. 16. The frequency-velocity product estimates 
are fed back to single frequency EKFs. Once the filter has converged, the values of m 
will all be equal to zero. 

2. Constrained Least Square 

Another approach to estimating the object velocity vector from Equation 2.13 
is the constrained least squares method [Ref. 10]. We can select one of the frequency 
component satisfying Equation 2.19 denoted by C, and the corresponding estimated 
frequency-velocity product denoted by d, as a constraint equation. Then the linear 
constraint equation becomes: 
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Cv = d. 



( 2 . 22 ) 



The problem now becomes: 

min(X 3 (^ 1^) - ft v)^(Xj(^ | A:) - ft v) 

V (2.23) 

Subject to Cv = d. 

From Equation 2.23 we can form the following Lagrangian 7: 

J=Ux^(k lit) - ft v)^(X 3 (it lit) - ft v) - A(Cv-d). (2.24) 

The solution for v can be found by equating the gradient of J with respect to v to zero: 

—=QjQiv-QJXJk\k)-XC'^=0. (2.25) 

dv 

The solution for v then becomes: 

v=(ft^ft)-‘ft^X3(it|it)+(ft^ft)''C^X. (2.26) 

We can write this equation as, 

Vc„=v„.(Q'-Q)-'C''X, (2-27) 

where Vcls denotes constraint least square solution, and denotes the least square 
solution. We can solve for X by invoking the constraint defined in Equation 2.22: 

Cvc^=Cv^+C(ft^ft)-‘c^X=J. (2.28) 

From Equation 2.28: 
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X=[C(Q^fl)-‘c^]''(if-Cv^). (2.29) 

Substituting Equation 2.29 into 2.27, we obtain the constraint least square solution: 



Vcz5=Vz5-^(ti''«)'’c1C(Q^Q)-‘c^]-‘(d-CvJ. (2-30) 

3. Adding Fictitious Noise 

Since the ambiguity is not modeled in the single frequency EKFs, the system 
may be assumed to have a modeling problem. This problem may cause the true 
estimation errors to become infinite if we don’t process the frequency-velocity estimates 
by one of the methods explained previously. Another way of solving this problem is to 
add fictitious process noise to the system as explained in References 8 and 9. 

The true frequency-velocity estimations are: 



■ ^^true- 



(2.31) 



We can add fi/t, where ^ is a 2 by 1 constant vector, to the frequency-velocity estimates 
at every step. The velocity estimate then consists of the true velocity plus a bias. This 
bias is eliminated easily after the system has converged by using the fact that velocity of 
the object is multiple of integers since the velocity is assumed to be pixel/frame. If the 
elements of vector n is positive and the object velocity is negative then the algorithm will 
estimate the object velocity as true velocity plus N, where N is the size of image since 
the object motion is assumed to be periodic. Using that method eliminates the maximum 
velocity constraint which is imposed from Equation 2.15. 
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D. SIMULATION 



The velocity estimation algorithms are simulated using varying signal to noise ratio 
(SNR) images. A computer generated square object has been moved on a 16 by 16 
image. Artificial noise is added to the image for each experiment. The SNR of each 
image is calculated as signal power over noise power in dB units. Relative translation 
vector error is obtained as the norm of error over the norm of true vector for each 
iteration except the first one, since it is assumed the initial velocity estimate is zero. 
Figure 2.2 shows the result of experiments that have been performed using 6 iterations 
for each SNR and using a positive velocity and a positive fi vector for the fictitious noise 
algorithm. The two-step weighted least square algorithm needs more iterations than 6 
to converge to the true velocity for low SNR images [Ref. 11]. The relative errors of 
this algorithm for low SNR are larger but they are smaller for high SNR images. The 
adding fictitious noise algorithm converges to the true velocity faster, since the added 
fictitious noise is in the shape of the expected value of true frequency- velocity product 
and the relative error becomes smaller for this algorithm. 

A second experiment is performed to indicate how the EKF algorithm enhances the 
image quality. The SNR of the input image and the SNR of the image estimated (output) 
by EKFs are compared. Figure 2.3 shows that for SNRs less than 10 dB there is an 
improvement in the SNR of the output image and this improvement is constant and about 
7 dB for very low SNR images. 
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Figure 2,3 SNR improvement of EKF algorithm. 
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m. FEATURE^BASED ALGORITHMS 



A. PROJECTIONS 
1. General 

Image analysis requires an understanding of how the image was formed. 
How a two-dimensional pattern of brightness is produced in an optical image-forming 
system can be studied in two parts: first, we need to find the geometric correspondence 
between points in the scene and points in the image; then we must figure out what 
determines the brightness at a particular point in the image. The geometric 
correspondences are termed projections. 

Projections transform points in a coordinate system of dimension n into points 
in a coordinate system of dimension less than n [Ref. 12]. Here, we will use the 
projection from 3D to 2D. The projection of a 3D object is defined by straight projection 
rays (called projectors) emanating from a center of projection, passing through each point 
of the object, and intersecting a projection plane to form the projection. 

Projections can be divided into two basic classes: perspective and parallel. 
The distinction is in the relation of the center of projection to the projection plane. If the 
distance from the one to the other is finite, then the projection is perspective. If the 
distance is infinite, the projection is parallel. Figure 3.1 shows these two cases. The 
parallel projection is so named because, with the center of projection infinitely distant, 
the projectors are parallel. When defining a perspective projection, we explicitly specify 
its center of projection; for a parallel projection, we give its direction of projection. 
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The visual effect of a perspective projection is similar to that of photographic 
systems and of the human visual system. The size of the perspective projection of an 
object varies inversely with the distance of that object from the center of the projection 
plane. 

The parallel projection is a less realistic view because perspective 




Figure 3.1 (a) Line AB and its perspective projection, (b) Line AB and its parallel 
projection. 

foreshortening is lacking. The advantages of parallel projection is that the projection can 
be used for exact measurements and parallel lines remain parallel. 
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2. Perspective Projections 

We can approximate an optical imaging system with an ideal pinhole at a 
fixed distance in front of an image plane. Assume that only light coming through the 
pinhole can reach the image plane. Since light travels along straight lines, each point in 
the image corresponds to a particular direction defined by a ray from that point through 
the pinhole (Figure 3.2), 



Figure 3.2 Perspective Projection 

We define the optical axis in this system to be perpendicular to the line 
from the pinhole to the image plane. We can introduce a cartesian coordinate system with 
the origin at the projection pioint and the z-axis aligned with the optical axis and pointing 
toward the image. Let V = pc,Y,Z), the vector connecting O to.4 and V’ = (x,yj), the 
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vector connecting O toA’, with /the focal length, that is, the distance of the image plane 
from nodal point O; and (x,y) are the coordinates of the point A ’ on the image plane in 
the coordinate system with origin at the point of the intersection of the image plane with 
the optical axis, and axes x and y parallel to the axis of the camera coordinate system OX 
and OY. By similar triangles, we can easily write: 



x= 



f^y=E 



(3.1) 



These equations relate the image coordinates to the world coordinates of a point. 
Further, to simplify the equations we may assume / = 1 without loss of generality. 

3. Parallel Projections 

Parallel projections are categorized into two types depending on the relation 
between the direction of projection and the normal to the projection plane. In 
orthographic parallel projections (Figure 3.3), these directions are the same, so the 
direction of the projection is normal to the projection plane. In oblique parallel 
projections, the projection plane is normal but the direction of the projection is different. 
If the direction of projection makes a 45° angle with the projection plane, it is called 
cavalier, if it makes a 63.4° angle, it is called cabinet projection. For our case 
orthographic projection is considered. 

We have a plane that lies parallel to the image plane at Z = in the 
previous perspective projection model. We define the magnification m, as a ratio of the 
distance between two points measured in the image to the distance between the 
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corresponding points on the plane. Consider a small interval on the plane (dX,dY,Of znd 
the corresponding small interval (dx,dy,Of in the image. Then: 



m 




where -Zg is the distance of the plane from the pinhole. The magnification is the same 
for all points in the plane. 

A small object at an average distance -Zg will produce an image that is 
magnified by m. The magnification is approximately constant when the depth range of 



_ ^Jdx'^+dy^ _ f 






( 3 . 2 ) 
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the scene is small relative to the average distance of the surfaces from the camera. In 
this case we can simply write for projection equations, that 

x=mX,y=mY. (33) 

with m = f/(-Z() and -Z^ is the average value of the depth -Z. Often the scaling factor 
m is set to 1 or -1 for convenience. Then we can further simplify the equations to 
become: 

x^X, y=y. (3-4) 

These equations model the orthographic projection model (Figure 3.3), where the rays 
are parallel to the optical axis or the focal length is infinite. 

The difference between perspective and orthographic projection is small when 
the distance to the scene is much larger than the variation in distance among objects in 
the scene. 

B. MOTION EQUATIONS UNDER PERSPECTIVE PROJECTION 

We can analyze the relation between the image plane motion and the corresponding 
three-dimensional motion for the case of perspective projection. For simplicity /(focal 
length) is assumed to be unity and image plane is assumed to be stationary. 

A rigid body with coordinates (X, Y,Zf moves with a translational velocity Vj. = 
(V,V,Wf rotational velocity Q = (A,B,Cf. From kinematics the three-dimensional 
velocity of any point on the surface can be written as: 
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(3.5) 



dt dt dt 



dX dY dZ 



)=Vj.^Clx(X,Y^. 



We can write this equation in component form as: 



X=U+BZ-CY; 



(3.6) 



Y=V+CZ-AZ; 



(3.7) 



Z=W^AY-BX. 



(3.8) 



Let (x,y) denote the coordinates of a point in the image plane. As explained before for 
perspective projection the relation between object point P, and the corresponding image 
point p is: 



The optical flow at each point in the image plane is the instantaneous velocity of 
the brightness pattern at that point. So the optical flow of the point (x,y) can be denoted 
by (u,v) where: 

u=x, v=y. (3- 11) 

Differentiating Equations 3.9 and 3. 10 with respect to time and using Equations 3.6, 3.7 
and 3.8 we obtain: 




(3.9) 




(3.10) 
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(3.12) 



dt Z z Z 

U-xW 2 \ .4 

= +B(1 +x)-Axy-Cy. 

Z 

In the same way: 

v=Z^+5;cy-y4(y2+l)+Cx (3.13) 

We can also write these equations in the form of: 

u=u^+u^y v=Vj+v^ (3.14) 

where (u„vj denotes translational component of the optical flow and the rotational 
component. 

u = U u=B(\+x^)-Axy-Cy, (3.15a) 

\ = ^ v^=Bxy-A(y^+\)+Cx. (3.15b) 

Z 

From Equations 3.12 and 3.13 we can eliminate the unknown depth variable Z. From 
Equation 3.12: 

Zu=U-xW^B{l+x^)Z-AxyZ-CyZ=*Z= , (3. 16) 

u -B( 1 +x^) +i4xy +Cy 



and from Equation 3.13: 



(3.17) 



From these two: 



Z= 



V-yW 

v+A(y^+l)-Bxy-Cx 



U-xW _ u-B{\+x^)+Axy+Cy ^2 ig) 

V-yW v^A(y^+\)-Bxy-Cx 

Equation 3.18 describes the constraint imposed by the measured value of the optical flow 
(u,v) at any image point (x,y) on the six motion parameters (U,V,W,A,B,C). 

Consider one point P = (X, Y,Z) before the motion with image point (x,y). Suppose 
that the point moves with a general motion, and goes to point P’ = (X\ Y’,Z’) with image 
point (x’,y'). Any three-dimensional rigid body motion is equivalent to a rotation by an 
angle 0 around an axis through the origin with directional cosines followed by 

a translation T=(AX,/iY,/iZf. The relation between coordinates of the point before and 
after the transformation is given by: 



{X',Y':zY = R(X,Y^'^+T, 



(3.19) 



where 7? is 3 by 3 orthonormal matrix of first kind (i.e., det(R)=l), and it is defined as 
[Ref. 13], 



R= 



nf+(l-nf)cos0 
-COS0) -/i3sin0 
nj/ijfl -COS0) +/i2sin0 



-COS0) +/i3sin0 
n \ +(1 -/i 2 )cos 0 
-cos0)-njSin0 



nj/i3(l -cos0)-/i2sin0 
/i2n3(l-cos0)+/ijsin0 . 
n3+(l-n^cos0 



(3.20) 



For image vectors the transformation becomes: 
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Z'{x',y',\Y=ZR{x,y,lf^T. 



(3.21) 



If 1 7 II 7^0, where | • || denotes Euiclidean norm, Equation 3.21 can be written: 



^{x',y',lY = R—{x,y,lf + f, 

in'’-'’' mr 



(3.22) 



where T=- 



m 



We can establish a general relationship between the two sets of image coordinates at this 
point. Using this relationship we can then proceed to solve the motion equations and 
obtain the structure of the scene. 

From Equation 3.19 we can write: 



X' 




''ll '•12 ''13 


X 




AX 


Y' 


= 


Tji Tjj 


Y 


+ 


AY 


Z' 




^32 ^33 


Z 




AZ 



and in component form: 

X'=riiX+rj2r+ri3Z+AX; 

Y'=r,^X^r^Y^r,,Z^AY; 

Z'=r 3 iX+r 32 y+r 3 jZ+AZ. 

From the perspective projection property, Equations 3.9 and 3.10: 



(3.23) 



(3.24a) 

(3.24b) 

(3.24c) 
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(3.25) 



Z' TjjX+rjj^+rjjZ (rjjA:+r 32 y+r 33 )Z+AZ’ 

As we did before we can eliminate the depth variable Z from these two equations. 



y/_ (/-2iJ^^^22y^>-23)^^Ay ^6) 

z' TjjX+rjjZ+rjjZ+AZ {r^^x+r^^+r^^Z+M 

From Equation 3.25: 

AX-x^AZ (3.27) 

x'(r3jx+r32y+r33)-(riix+ri2y+ri3) 



From Equation 3.26: 



AY-y'AZ 

yV3iJ^+'-32>’+'’33>-(''21^-"''22>’-"^23) 



(3.28) 



Equating these two equation and solving, we obtain: 

XX ^( A Zt 2 3 - A 1>3 j) +xy ^( A Xr^ j - A Zfj j) +x( A yrj j - A Xf 2 j) + 

yx'(AZr22~ A J>32) +>ry'( AXr3j -AZr^^) +y(A FTjj- AX r22) + 

x'(AZr23~ A Fr33) +y'(AXr^^ ~^Zr^^) +(A Yr^^-AXr^i) =0. (3-29) 

Also from Equation 3.29, it is possible to show this relation as. 
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X 



(3.30) 



[x^ y' l]£ 



= 0 . 



where 



E= 



ej «2 



^4 ^5 ^6 



^7 ^8 ^9 



(3.30a) 



with 



Cj=AZr2j-Ayr3j, e^-=^^Zr^^-^Yry^y e^=^2r^-^Yr^y 

«4=AXr3j-AZrjp e^=LXr^^-LZr^y (3.31) 

C2=AFrjj-AXr3j, e^=^Yr^.^-^Xr^2^ e^=^Yr-^^-LXr^. 

The elements of E are called essential parameters defined in terms of motion parameters 
[Ref. 14]. Equation 3.31 relates image coordinates of feature points and the elements 
of the matrix E. Since these equations are linear and homogeneous in AX, AT, and AZ, 
the essential parameters can be determined up to a scale factor and the sign of the matrix 
E is arbitrary. We can solve for motion parameters from these essential parameters. 
The relative depth (depth scaled by magnitude of translation) of each point can then be 
determined from the motion parameters and the observed projections of the points. Note 
that if we assume motion is only translational then the R matrix will be the identity and 
the matrix E will become: 
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(3.32) 




0 -AZ 
AZ 0 
-AY AX 



ay' 

-AX 

0 



which is a skew-symmetric matrix. 

In general, it is possible to represent the matrix E in the form of: 

E = = [E^R^ E^R^ E^R^] = [E^ E^ £ 3 ], (3.33) 

where R=[Ri R2 RJ- From Equation 3.22 it can be seen that the magnitude of 
translational vector ( || fI ) and absolute depths of the object points (Z and Z') can not be 
determined from monocular vision. Equation 3.22 will still hold when | r|| , Z and Z’ 
are multiplied by any positive constant. Since only the direction of T is known and E 
can be defined up to a scale factor we can normalize things by assuming || r|| =7. This 
implies ||£p = 2 which can be proved: 

II £ P = trace! EE^ ) = trace! Efi(Efi)'^ } 

= trace ! EfiR^E^ } = trace ! Efil } 

= 2(AX^ + AT* + AZ^j = 2. 

From N point correspondences we can establish a matrix A in order to find the 
essential parameters: 



x/i X, y^x[ y,y; y^ x[ y[ 1 
X2X2 x^2 ^2 yih y-iXi >2 ^ y'l ^ 

yA yAn y'n i 



(3.34) 
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If there is no noise, from Equation 3.29 we can write Ae=0, where e is a nine- 
dimensional vector formed by stacking the elements of the matrix E into a column. With 
noise we will try to find a vector h such that ||/4/i|| =min, subject to ||ftp=2. 

Since E has 8 degrees of freedom, we need at least 8 point correspondences to 
solve for E. If we use 8 point correspondences and if the rank of A is 8, then the null 
space is of dimension 1 and h is the vector of normV2 in the null space. The solution 
is the eigenvector of A^A of norm V 2 corresponding to the smallest eigenvalue. Then 
we can produce a solution; first finding the smallest eigenvalue of A^A and then its 
corresponding unit eigenvector h. Then E will be ^2 times this h vector. Degenerate 
cases occur when Rank(A) < 8. If 3D points are coplanar then a degenerate matrix A 
results. 

Note that: £, satisfies the following equations: 









-A FAX 


-AZAX 






11 


-A FAX 




-AFAZ 


. (3.35) 






-AZAX 


-AFAZ 


AX^+AF^, 




and also. 






o 

II 

II 




(3.36) 



Since we estimated E in the previous step, now we can solve the following mean square 
problem in order to find T. 
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Min lE'^Tf Subject to I17p=l. (3-37) 

The solution is the eigenvector corresponding to the smallest eigenvalue of EE^. We can 
estimate the rotation matrix R, by minimizing || E - EjR || with respect to R or we can 



find R directly from the W matrix. Let the matrix W be: 



W=[W^ W^=[E^xE^^E^Ej^ E^E^^E^E^ E^E^^E^xE^. (3.38) 

Using the identity, (axb)xc = (a.c)b-(b.c)a, 



E^xE^+E^E^ =(EpcR^j)xE^+(E^R^j)x(EpcR^j) 

— /fjj — (/?j j.£'j)£'j+(/?j2-(^13^'£j)-E( 

=/?i, -(R^^.E)E^H(R,2>^Rij).E)E, 



This proves that first column of R is EjxE, + E^E^. Similarly other columns can be 
found. 

Using Equation 3.22 we can find the relative depths: 



Find Z=( 



Z' Z 



m imi 

[(x',y',l)^-R(x,y,l)^] 



) such that. 



2' T 

IIT’II 1I7II 



= mm. 



(3.40) 



using standard least square methods. The relative 3D position of any point at time t 2 and 
t, then becomes: 
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(3.41) 



(X',y'^y= 



'imr ^ 1711 i7i" 



(X,y^^= 





C. ERROR ANALYSIS OF THE ALGORITHM 



(3.42) 



The error analysis of the algorithm will be discussed at this section using the 
concepts explained in Reference 2. The feature points used in the algorithm are 
corrupted by noise. The sources of noise are feature detector errors, matching errors, 
quantization errors and system calibration errors. All these errors result in errors in the 
solution for the motion parameters and 3D structure of the scene. The computer 
roundoff errors can be made small enough to be ignored for error analysis. So, the 
primary source of noise is assumed to be the perturbation of image coordinates. 

The errors depend on the motion parameters and the scene structure. First the 
effects on the motion parameters and then the effects on scene structure will be 
discussed. 



1. Motion Parameters 

The error will depend the motion parameters and is discussed at this section. 
The magnitude and the direction of the translation vector and the rotation matrix elements 
will effect the errors that occur in estimating the structure and the motion of the moving 
object. 
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a. Magnitude of Translation 



If the magnitude of translation vector is zero, the estimate of the 
translation direction will be arbitrary and the estimated translation vector will be an 
arbitrary unit vector (since T,xT = 0, where is a unit vector of the same direction 
as the translation). From Equation 3.41 we see that the relative depths of feature points 
can not be determined. When T=0, the rank of A in Equation 3.34 can not be larger 
than 6 [Ref. 15]. R can still be determined in this case by picking up any h satisfying 
\\Ah\\ =min, since E is defined as E=E, x R in Equation 3.33. 

6. Direction of Translation 

Figure 3.4 shows the relation between 3D world coordinates and the 
corresponding image coordinates with motion. As mentioned earlier, motion is 
represented by a rotation followed by a translation. From Equation 3.30 and 3.33 we 
can write that x’Efix = 0, where x’ and x denotes image coordinates at 
respectively. From this relation we can see that T is orthogonal to the cross product 
x'xRx. So the translation vector is determined from the relation T.(x’ x Rx ) = 0. 

Figure 3.5 shows the case where the translational vector is orthogonal 
to the image plane. It can be seen from the figure that the vectors x* x Rx dxe spread 
over the X-Y plane around the origin. This locus is shown by a shaded area in the 
figure. Figure 3.6 shows the case where the translational vector is parallel to the image 
plane. This time x’ x Rx are confined in a small shaded area in the X-Z plane. The 
perturbations in the image coordinates will cause the product a: ’ x Rx to be perturbated 
away from the original positions as denoted by dark disks in the figures. As explained 
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F$X 






Figure 3.4 General Motion Relation 

before, T is determined from T.(x‘ x Rx)=0, so T is orthogonal to n vectors (for n 
correspondences) in the shaded area. Since the shaded area in Figure 3.5 spreads around 
the origin while the shaded area in Figure 3.6 is confined in a small area in one side of 
the origin, Figure 3.5 allows a more reliable estimate of T than Figure 3.6. 

The perturbation of Figure 3.5 will not leave the shaded area often as of 
Figure 3.6. This can be seen as follows. Assume the vector x’ is perturbed in the image 
plane. The perturbation disk is orthogonal to Rx, and almost parallel to the shaded area 
if Rx is near the optical axis. Similarly, the perturbation due to Rx is orthogonal to x’ 
and so it is also nearly parallel to the shaded area if x’ is near the optical axis. Hence, 
such individual perturbations will not cause large errors in the estimation of T. 
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T 




Figure 3.5 Orthogonal Translation. 

In the case of Figure 3.6, the perturbation disk is nearly orthogonal to 
the shaded area if Rx and x’ are near the optical axis as explained before. Therefore the 
perturbations of x' and Rx in Figure 3.6 will cause larger errors in the estimation of T 
than those in Figure 3.5. It is easy to see from Figure 3.6 that the largest perturbation 
of the estimated translation direction is in the Z component. 

Both the shape of the shaded areas and the orientation of the perturbation 
disks imply that a translation orthogonal to the image plane allows more stable estimation 
of translation direction T than a translation parallel to the image plane. 
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T 




Figure 3.6 Parallel Translation. 

c. Rotation Parameters 

The correlation between the rotation and translation is very complicated. 
A rotation about the optical axis is easy to distinguished from translation since no 
translation will give the same displacement field in the image plane. But, a displacement 
field generated by rotation about an axis parallel to the image plane (x or y axis) may be 
nearly the same as that generated by a translation. The differences between these two 
displacement fields are not very large, especially for short displacement vectors or at the 
center of the images. So, the algorithm may easily confuse translation with rotation in 
the presence of noise. As a result, rotations with a rotation axis parallel to the image 
plane are more sensitive to noise than other rotations. However, since the displacement 
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field in the image plane is mainly caused by translation in most cases, the effects of 
translation are more dominant. The rotation matrix R is determined using the translation 
vector T (Equation 3.39). A difficult case for the estimation of translation is therefore 
also a difficult for the estimation of rotation. 

From the above, we can conclude that long displacement vectors will 
generally result in more reliable solutions for rotation parameters than short ones in the 
presence of noise. To yield long displacement vectors, the motion should be large and/or 
scene should be close to image sensor. 

2. Structure of the Scene 

The nine-dimensional unit vector h is determined up to a sign if and only if 
rank of A in Equation 3.34 is equal to 8. A necessary and sufficient condition for the 
rank of A to equal to 8 is given by Lxmguet-Higgins [Ref. 15]. They define as 
’degenerate’ eight-point configurations for which the algorithm fails. A configuration is 
degenerate if as many as four of the points lie in a straight line, or if as many as seven 
of them lie in a plane. Degeneracy also arises if the configuration includes six points at 
the vertices of regular hexagon, or consists of eight points at the vertices of a cube. In 
the presence of noise, the rank of A is generally mathematically full even if the actual 
structure is degenerate. If the structure is degenerate the solution is unreliable. So, in 
the presence of noise, we should consider the numerical condition of the matrix A. 

If the projections of feature points are confined to a small portion of the 
images, only a small portion of the image resolution is used. This will result in a less 
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reliable solution. So, the configuration of the feature points should be such that its 
projection covers as much of the image as possible. 

In the discussion of motion parameters, we see that long displacement vectors 
will result in more reliable solutions. For the same amount of motion, the scene should 
be close to the camera so that it yields long displacement field vectors in the image 
plane. This condition is related to the numerical condition of matrix A. 

A very effective way to reduce the error in the solutions is by using more 
points than the minimally required 8. Since a noise-corrupted image vector may result 
in a large amount of error, it is better to use only reliable feature matches for motion 
parameter estimation. 

The resolution will affect the correctness of image coordinates. So, using 
high resolution will decrease the errors in the algorithm. Assuming resolution and focal 
length to be fixed, reducing the image size by a factor, say 2, will be equivalent to 
doubling the distance from the camera to the scene and reduces the variation in depth of 
the image. This is equivalent to reducing the resolution. So, a reduction of image size 
will worsen the performance of the algorithm. 

The following section presents statistical data obtained through simulations 
that demonstrate the comments made in the discussions. 
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3. Experimental Results 

In the simulations, the feature points are generated randomly within a cube 
of 10 X 10 X 10 units. The object distance is 11 units, the image size is 2 units, and the 
image resolution is stated for each simulation. 

All the errors shown in this section are relative errors. The relative error of 
a matrix or a vector is defined by the Euclidean norm of the error matrix or vector 
divided by the Euclidean norm of the correct matrix or vector. Relative errors are often 
simply referred to as errors, in the following sections. 

a. Simulation for Image Resolution 

Figure 3.7 shows the relation between the errors in the estimates and the 
image resolution. It can be seen that increasing the resolution by a factor 2 roughly 
decreases the errors by a factor 2 which is expected from previous discussion. 

b. Simulation for Number of Corresponding Points 

Figure 3.8 shows the relation between the errors in the estimates and the 
number of corresponding points. It can been seen that an increase in the number of 
corresponding points will decrease the errors as expected. 

c. Simulation for Image Size 

In order to show the effects of decreasing image size, the image size is 
reduced by a factor of 2. To make sure that the same scene is visible, the object is 
moved away from the camera by a factor of 2. The same simulation as in the simulation 
for the number of corresponding points is performed again. From Figure 3.9 it can be 
seen that reducing the image size worsens the estimates which is consistent with earlier 
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EFFECTS OF IMAGE RESOLUTION 




Figure 3.7 Simulation for image resolution, 
discussions. 

d. Simulation for Noise 

In order to see the effects of perturbations in x’ (second frame image 
coordinates) artificial image noises are added to the second frame image coordinates. 
The noise is added to each point in the second frame in a circle of radius r and centered 
at that point. This is done by perturbing the second frame image points from 0 percent 
to 10 percent (with 100 levels). To control the orientation, the sign of normally 
distributed random numbers are used. This experiment was performed several hundred 
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EFFECTS OF NUMBER OF CORRESPONDING POINTS 




Figure 3.8 Simulation for number of corresponding points, 
times. Averaging the results of these experiments Figure 3.10 is obtained. 
e. Simulation for Magnitude of Translation 

In order to see the effect of translation magnitude, a translation vector 
equal to (t, 0, t) is used for simulation. The value of t is changed from 0.5 to 4.5 with 
rotation axis (1, 0, 0) and rotation angle 8°. The results of these simulations is given 
in Figure 3.11. We can see that as the magnitude of translation increases, the error 
levels are decreased. 
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Figure 3.9 Simulation for image size. 

/. Simulation for Direction of Translation 

For this simulation the rotation axis is taken as (1, 0, 0) and the rotation 
angle 8°. The translation vector is taken as (0.5, 0, k) where k is changed from 0 to 2 
using 20 evenly spaced values. The results of the simulations are shown in Figure 3. 12. 
We can see that as the direction of the translation vector becomes orthogonal to the 
optical axis the error levels are decreased. This result is consistent with the previous 
discussion. 
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EFFECTS OF PERTURBATION 
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PERCENT PERTURBATION 

Figure 3.10 Simulation for noise. 

D. MOTION EQUATIONS UNDER ORTHOGRAPHIC PROJECTION 

We can analyze the relation between the image plane motion and the corresponding 
three-dimensional motion for the case of orthographic projection [Ref. 3]. The same 
notation and assumptions will be used as were used for the case of perspective projection. 

For general rigid body motion, Equation 3.19 is still valid. From N image point 
correspondences: 
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EFFECTS OF MAGNITUDE OF TRANSLATION 




Figure 3.11 Simulation for magnitude of translation. 



The problem is to determine R, T, and pCj.Yj.ZJ for N points. For orthographic 
projection from Equation 3.4: 



x=X, y=Y; 
x'=X', y'=Y'. 



(3.45) 



From Equation 3.45 it is obvious that aZ can never be determined and the Z/s can be 
determined only within an unknown additive constant. For the following discussion, 
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EFFECTS OF DIRECTION OF TRANSLATION 




Figure 3.12 Simulation for direction of translation, 
we’ll try to determine: R, aX, aY, X^, Yj, and Zf-Zj/or i=l,2,...,N. 

1. Two-view Case 

We assume that two orthographic views at time instants tj and are taken of 
a rigid object moving in the 3D object space with N point correspondences. We’ll show 
that no matter how large N is, the problem has an infinite number of solutions. First, 
we can decompose the motion from tj to into a rotation R around the point pCj.Yi.Zj) 
followed by a translation: 
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^ 1 = 



(3.46) 



X[-X, 

Y[-Y,. 

Z[-Z, 



Second, to determine R and (Xi,Yi,ZfZj), we can move (X,Y,Z) and PC’,Y’ 
origin; 




Then all other points are related simply by a rotation from tj to 12 '. 



xi 




X 


y'. 


= R 


Yi 


X 




h 



i =2.3,. ..,7V'. 



From Equations 3.48 and 3.45 we can write: 



y'i = 



In matrix notation: 



/ 

















^11 ^12 




+ 


^13 


/ 

y.-j 




^21 ^22. 


y.. 




."■23 



To eliminate premultiply both sides of Equation 3.50 by to got: 



,Z’) to the 



(3.47) 



(3.48) 



(3.49) 



(3.50) 



46 






(3.51) 





23^11 ^13^21 ^23'’l2 "^13^22 



Using the identities; 



^11^23~^2/l3“~^32’ 

^ 12 ^ 23 "^ 22 ^ 13 ~^ 31 » 



12' 23 ' 22' 13 



(3.52) 



which follow from the fact that each row of R is cross product of the other two, we get: 



which is linear and homogenous in the four unknowns rjj, r 2 s, r^j, and From N point 
correspondences, we get (N-1) such equations. Therefore, if N > 4 and assuming that 
the points are not coplanar, we can solve the set of Equations 3.53 to obtain rj^, Tjj, r^j, 
and r ^2 to within a scale factor. In the absence of noise, four point correspondences are 
sufficient to solve Equation 3.53, additional point correspondences are superfluous. The 
important point to notice here is that Equation 3.53 contains all the information we can 
get from the point correspondences. Therefore, no matter how many point 
correspondences we may have, the only thing we can determine about R is the values of 
Ti 3 , T 23 , Tjj, and to within a scale factor. Obviously, by changing the scale factor, we 
can get infinite number of solutions of Tjj, r 2 s, r^j, which satisfies; 



W''i3>'.-^'’32^.-^3iy. = ^ 



(3.53) 




(3.54) 



which follows from the fact that 
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(3.60) 



For any of these infinitely many solutions, we can construct an R. For each solution of 
R, we can use Equation 3.50 to find Z,; for i=2,3,4. Z,’ can be found from: 



whether a set of proposed point correspondences is legitimate. Assume that we have four 
points over two views, then there are 4! = 24 different mapping between the two sets 
of points. For each mapping, we can solve Equation 3.50 to get rjj, r^j, r ^2 to within 
a scale factor. The mapping is permissible if and only if 



orthographic views result in an uncountable infinite number of solutions for motion 
parameters and the structure of rigid objects. We also have derived a simple test for the 
legitimacy of point correspondences [Ref. 3]. 

2. Four Points Over Three View Case 

We assume that the image plane is stationary and that three orthographic 
views at time instants ti, t 2 , and respectively, are taken of a rigid body moving in the 
3-D object space. We again use the notation explained before, except that coordinates 
at will be double primed, and the rotation matrix from to is denoted by: 




(3.61) 



We remark that as a result of the above derivation, we have a way of testing 




(3.62) 



In summary, no matter how many point correspondences we have, two 
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^11 


^12 


^13 


s = 


^21 


^22 


■^23 » 




^31 


^32 


^33 



and the rotation from tj to by: 



>^11 


H-,2 


Wi 3 


^21 


W22 


W23 


W31 


^32 


W33 



Thus, 



W = SR. 



( 3 . 63 ) 



( 3 . 64 ) 



( 3 . 65 ) 



Assuming we are given four point correspondences over three views, again we can let: 



Then 



and 



X 








x'i 




0 


Yi 


= 


1 '.' 


= 


Y'l 


= 


0 


X 












0 



yI 



z .. 



( 3 . 66 ) 



( 3 . 67 ) 
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(3.68) 



xf 

y 1' 


il 

Co 


Yl 









y 



z 



= 

Z: 



i=2.3,4. 



(3.69) 



Using the method explained in the previous section, we can determine 
(s, 3 ,S 23 ,Ssj,Sj 2 ) and (Wjj,W 2 s,Wjj,W 32 ) to within scale factors: 



(^13»^23’^31’^32^ ~ ®(Pl3>P23>P3i>P32)» 

('5i 3>^23’^31’^32^ “ .^(ni3»*^23»*^31»*^32^» (3.70) 

^^13’^23’^31’^32^ ~ Y(^13>^23’^31*^32^' 

where py, Oy, and oiy are known a, j8, and y are unknown constants which are assumed 
to be nonzero. 

Equation 3.65 is the only constraint we have on r,y, Sy, and Wy. We can 

rewrite it: 
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Multiplying out, we get: 
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From Equations 3.73 and 3.74, we can determine a, j 8 , and 7 . Then, we can find R and 
S. Premultiplying both sides of Equation 3.73 by we get: 
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From Equation 3.70: 



.^Y(®23^13~®13^23) ~ .^Cc( ^32Pl3'*’^3lP23)» 



(3.76) 
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g _ °23^13 ^13^23 

Y "®32Pl3'^®3lP23 

Similarly, postmultiplying both sides of Equation 3.74 by 7^ yields: 

fi _ ^3lP32~^32P31 

Y ~‘^3lP23‘^®23Pl3 

Thus, a/y and fi/y are determined if: 
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Next, premultiplying both sides of Equation 3.73 by [ 313 , 823 !, we get: 
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From Equation 3.70 we get: 



^a(a^3C0,3 + O23W23) = -^33^“(03iP,3 + 032P23)+^^(o53 + 023)r33, 



or. 



(0i3<i)i3 + 023<i)23) ~ ~(®3iPi3‘*'^32p23)^33‘''^(^13‘''®23)^33‘ 



Similarly, postmultiplying both sides of Equation 3.74 by [rai.ri 2 /^ yields: 
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(3.86) 



(0)3,P31+<*>32P32) = -=^(<^3lPl3^<^32P23)^33-"-^(p31-"P32)^33- 

A unique solution for {j0/y)% and (ot/yjSjj can be determined from Equations 3.85 and 
3.86 if: 



(5j3+^23) ”(•^31^13 '*^‘^32^23^ 

-(^ 31 ^ 13 -"‘^ 32 '' 23 ) ( 4 + 4 ) 

Then, since a/y and fi/y have already been obtained, we can determine and Sjj 

uniquely. 

From: 
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From Equations 3.79 and 3.80, we have two solutions to a, and 7 . Then, we have 
two solutions for r^j, %• ^ 32 > this moment. For each 

solution, we can determine the remaining elements of R and S by the method described 
below. To find r,j and we may use two properties of the rotation matrix: 



^ 31^11 ■*■^ 32^12 ~ ^ 33 ^ 13 > 



(3.92) 



^32^n-^3l''l2 = -^23- 



(3.93) 



Equations 3.92 and 3.93 denotes two linear equations with two unknowns (rjj and r^j), 
and the coefficient matrix has the determinant: 



^31 ^32 

T32 T 31 

which is assumed to be nonzero. Therefore we obtain a unique solution for tjj, r, 2 . 
Similarly, a unique solution for r 2 i and r 22 can be obtained. Using the same method, s,i, 
s, 2 , S 21 . S 22 can be found. 

Using Equation 3.50, the Z, ’s can be determined for i=2,3,4. As pointed out 
in [Ref. 10], four point correspondences over three frames yield two solutions for motion 
and structure. The point configurations of these two solutions are reflections of each 
other with respect to the image plane. 






(3.94) 
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E. SIMULATION 



Computer generated random numbers are used as feature points when simulating 
the algorithm. The second and the third frame image points are perturbed within a circle 
with radius equal to a varying percent of the original values. This experiment is 
performed several hundred times and the results are plotted on Figure 3.13. 

From Figure 3. 13 we can see that the algorithm is very sensitive to the perturbation 
of the feature points. Even 0.1 percent perturbation is enough to obtain very large 
errors. So the algorithm is valid only for ideal cases and can not be used with low SNR 
images. 



55 



EPFECT or NOISE 




Figure 3.13 Simulation for perturbation. 
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rv. DIFFERENCE IMAGES 



A. GENERAL 

One of the simplest approaches for detecting the changes between two image 
frames taken at different times, is to compare these two images on a pixel by pixel basis. 
One way of doing this is to form a difference image (DI). A difference image is a 
binary image generated by comparing the two frames. The DI is generated by placing 
" 1 " in those pixel positions for which the corresponding pixels in the two frames being 
compared have an appreciable difference in their gray level characteristics. This 
operation is stated as: 






1 

0 



if Wyyj^)-Ax,y,t)\>T, 
Otherwise. 



(4.1) 



where f(x,y,tj),f(x,y,tj) are image frames at times r,- and tj respectively, is the 

difference image and Tis a threshold. The operation is computationally straightforward 
because it involves only the subtraction of corresponding pixels. This approach is 
applicable only if the illumination is relatively constant within the bounds established by 
threshold. 

Difference images alone reveal little information as to the higher level nature of 
scene and sensor change as reflected in the image plane. For example, the difference 
images will reflect the combination of motion effects in the case of several moving 
objects [Ref. 16]. 
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Difference images are very vulnerable to noise. 1-valued entries in in Equation 
4. 1 often arise as a result of noise. The removal of noise may be achieved by forming 
4 or 8 connected regions of Is in and then ignoring any region that has less than a 
predetermined number of entries. Unfortunately, this results in ignoring small and/or 
slow-moving objects [Ref. 17]. 

The concepts explained above are illustrated in Figures 4.1 and 4.2. Figure 4.1 
shows an image frames taken at times ti and tj containing a single object of constant 




Figure 4.1 Image frames at times tj and t^. 
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intensity denoted by Ds. Figure 4.2 shows the difference image computed using 
Equation 4. 1 with a threshold larger than the constant background intensity. Note that 
two disjoint regions Rj and J ?2 in the difference image are generated. The region of 
intensity Rj which is called the leading edge is generated because of "covering" the 
background and R 2 which is called the trail edge is generated because of "uncovering" 
the background over time interval The region between Rj and R 2 is zero because of 



DIFFERENCE IMAGE 




0 2 4 6 8 10 12 14 16 



Figure 4.2 Difference Image. 

the presence of the object in this region in both images and the object has constant 
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intensity. Since the background is assumed to be unchanged in both images the 
background region will also be zero in the difference image. 

B. ACCUMULATIVE DIFFERENCES 

A sequence of images f(x,y,ti),f(x,y,t 2 ), ...,f(x,y,tj are taken at times tj, t„ 
respectively and let f(x,y,tj) be the reference image. An accumulative difference image 
is formed by comparing the reference with every subsequent image in the sequence. A 
counter for each pixel location is incremented every time there is a difference at that 
pixel location between the reference and an image in the sequence. Thus, when the k* 
frame is being compared with the reference, the entry in a given pixel of the 
accumulative image gives the number of times the gray level at that position has been 
different from the corresponding pixel value in the reference image. At each time, 
differences are established by using Equation 4.1 then they are summed up to generate 
the accumulative image. 

These concepts are illustrated in Figure 4.3. A rectangular object has been placed 
at the upper left comer of the picture then moved to the right and down at a constant 
velocity of 1 pixel/frame (column velocity) and 2 pixel/frame (row velocity). Figure 4.3 
shows the corresponding accumulative difference images for the 2nd and 4th frames 
respectively. The velocity of the object can be estimated from the repetition times of the 
homogeneous rows (every element in the row is equal to each other) and homogeneous 
columns (every element in the column is equal to each other) in the accumulative image. 
From Figure 4.3 we see that there are one homogeneous column and two homogeneous 
rows at each time. This shows that the column velocity is 1 and row velocity is 2. 
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ACCUMUI-ATJVC IMAOC AT T1MC:t2 




ACCUMUI_A-nVC IMAGE AT TIMEttA 




Figure 4.3 Accumulative Difference Images. 



Finding velocity from the accumulative image becomes very hard if noise is present 
in the image since some pixel counters may be incremented because of the noise. This 
effect can be reduced using 4 or 8 connected regions as explained previously. Also, it 
can be seen from Equation 4. 1 that selecting the proper threshold value is very important 
for estimating the correct object velocities. This is illustrated in simulation section. 
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The object velocity can also be estimated from the changes in the center of the area 
that the object covers at each picture. This can be accomplished by obtaining binary 
images for each frame. This binary image algorithm is explained in the next section. 
C. BINARY IMAGES AND VELOCITY ESTIMATION 

Binary (two-valued or black-and-white) images are obtained by setting pixel values 
to "0" for all image points corresponding to the background and setting pixel values to 
"1" for all the image points on the object. If the object appears consistently darker (or 
brighter) than the background, it is easy to generate the binary images, since generating 
binary images simply requires thresholding the pixel values. If the object has similar 
gray-level values as the background and/or noise is corrupting the image, the 
thresholding procedure becomes harder. As explained in Reference 5, Histogramming 
or Segmentation methods should be applied in this case to obtain the binary image. 

Assuming the image has been digitized with n rows and m columns, the area of the 
object can be computed in units of the area of a picture cell: 

n m 
i=i y=i 

where py is the value of binary image at the i* row and j* column. The x-position of 
center of the area (COA) can be found by using: 
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(4.3) 



1 " 



A 

where it, is the x-projection of the image, which gives for each column the number of 
picture cells in the column that have the value one. Similarly, the y-position of the COA 
can be found using: 




(4.4) 
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Figure 4.4 Binary Image, X and Y Projections. 
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where hj is the y-projection of the image, which gives for each row the number of picture 
cells in the row that have the value one. These concepts are illustrated in Figure 4.4. 

Assuming the object is rigid and each image is generated by using orthographic 
projection, the x-component of the object velocity can be found simply from: 






^x2 ^xl 

h-h ’ 



(4.5) 



where A^ 2 > ^xi the x-components of CO As at time and ti, respectively. Similarly, 
y-component of the velocity can be found from: 



(4.6) 

where A ^2 and Ayi are the y-components of COAs at time and ti, respectively. 

Using the algorithms explained in this chapter, some experiments has been 
performed with real images and computer generated images. The simulations and the 
results are presented in the next section. 

D. SIMULATION 

The concepts explained for accumulative differences are applied to a computer 
generated square object. The object has been moved at a rate of 5 pixels/frame right and 
5 pixels/frame down. The accumulative difference for 8 frames is obtained as explained 
above. The resultant image is shown on Figure 4.5 from which it is possible to estimate 
the object velocity since no noise is added and the object has a regular shape. If the 
image has noise, and/or has irregular shape, it is very hard to estimate the velocity of 
the object. To demonstrate this the car image is used which is shown in Figure 4.6. 
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Figure 4.5 Computer generated accumulative difference image. 



The accumulative difference for this image for 3 frames is shown on Figure 4.7. Since 
the background is constant for three frames, it is eliminated except for the regions the 
car covers or uncovers. Velocity estimation is very hard in that case. The only thing 
we can obtain is the type of motion, [Ref. 17]. So, we can say that this algorithm is 
adequate only for an "early warning system" which detects only changes and the 
direction of the motion. 

The concepts explained for binary images are simulated using computer generated 
binary images. A square object has been moved through the image plane and artificial 
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Figure 4.6 Car image. 



image noise is added at each frame. To reduce the effect of noise, a thresholding 
process is applied to the frames, then only nonzero four-connected pixel regions are used 
to estimate the motion using the changes in the COA. The relative errors for different 
SNR are obtained. Figure 4.8 shows the result of the first experiment with a threshold 
of 0.5. Figure 4.9 shows the result of another experiment with a threshold 0.25. 

From Figures 4.8 and 4.9, we can see that as we increase the threshold the relative 
error becomes smaller, but we may ignore the small changes and slow moving objects 
which is consistent with the previous discussion. The error levels for high SNR may be 
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Figure 4.7 Accumulative difference car image, 
acceptable for some specific applications, but for low SNR the relative errors are very 
high and this algorithm does not provide a good velocity estimate. 
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Figure 4.8 Simulation result with threshold 0.50. 
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Figure 4.9 Simulation result with threshold 0.25 
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V. CONCLUSIONS 



The performance of the EKF algorithm, linear feature-based algorithms, and the 
differencing algorithm are evaluated on low SNR images after outlining the algorithms. 

Chapter II presents the EKF algorithm which is implemented in the spatial 
frequency domain. Implementing the EKF in the spatial frequency domain decreases the 
required computations greatly with respect to a straight forward implementation of the 
EKF [Ref. 7]. Estimating the object velocity from the frequency-velocity product 
necessitates the use of more image frames than the other algorithms discussed in this 
thesis. 

The EKF algorithm assumes constant background and provides a two-dimensional 
nonlocalized velocity vector estimate. Reference 1 1 summarizes the performance of the 
algorithm using zero background or a checkerboard background, at varying noise levels. 
It is shown that as the noise level increases and/or the object in the image plane does not 
have constant image brightness, such as a pyramid object, the number of iterations 
required to converge to the true velocity increases greatly. 

For low SNR images, the EKF increases the image quality greatly, as shown in the 
simulation part of Chapter II. This technique produces much better velocity estimates 
for low SNR images than the other algorithms. 

Although a detailed overview of the algorithms based on optical flow are not 
presented in this thesis, some of the results of Reference 6 will be outlined for 
comparison purposes. Optical flow algorithms restrict the motion to be smooth and small 
thus requiring a high rate of image acquisition. It also requires that motion vary 



70 



continuously over the image plane. These two restrictions are effected by object 
occlusion and initial and boundary conditions. Since this algorithm mainly relies on first 
and second order derivatives of image brightness values, the noise is enhanced during 
these operations which results in more sensitivity to noise. This algorithm does not 
provide usable results for low SNR images. 

Feature-based approaches, as discussed in Chapter III, strictly require that 
correspondence be established between image frames. It is the author’s belief that much 
work should be done in this area to improve the performance of these algorithms. It is 
shown in Chapter III that even small relative perturbations of feature points can result 
in large relative errors. Noise and occlusion worsen the establishment of 
correspondences between features and decrease the performance of the algorithms. One 
way of decreasing the sensitivity to noise is to use more than the required number of 
features in least squares technique. This can have a smoothing effect but it may also 
cause additional complications. For example, the computation time is increased for the 
establishment of correspondences. For ideal cases these algorithms produce very 
desirable results. These are the only algorithms compared in this thesis that provide 3 
dimensional velocity and structure estimation. This characteristic is the main advantage 
of these algorithms. 

The accumulative differencing algorithm provides computationally straightforward 
motion estimation. Unfortunately only changes in the image scene and the direction of 
the motion can be estimated with real images. Differencing the images iteratively 
reduces the SNR of images processed by the algorithm which results in poor estimation. 
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Using the changes in the COA of binary images also provides a straightforward motion 
analysis method, but the thresholding and the histrogramming techniques used by the 
algorithm are vulnerable to noise. In Chapter IV, it is shown that those algorithms do 
not provide good motion estimation for low SNR images. 

The preference of the motion estimation algorithm generally depends on the kind 
of application and the expected SNR of images. For high SNR images, feature-based 
algorithms are preferable; they provide three-dimensional information on motion and 
structure. The perspective projection method can provide realistic results which are 
similar to those obtained with the human visual system. If the object distance is large, 
then the variations in the distance to the object points can be ignored and orthographic 
projection is a good approximation to perspective projection. 

Some specific applications such as "early warning systems" may require only the 
detection of changes in the image plane or low level motion estimation. Then, because 
of its computational simplicity, differencing algorithms may be preferable. But, for low 
SNR images, simulations have shown that the EKF provides the best motion and 
structure estimation. The EKF also enhances the image quality at low SNR. So, for low 
SNR images, the EKF algorithm is preferable. For future work, the EKF algorithm can 
be improved to also estimate rotational motions and the depth of the object. 
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